Figure captions

Fig. 1 $\alpha$ dependence of several quantities in the RS solution for $\delta = 1$ and $T = 1$.
(a) $q$ and $R$ (dotted curve) (b) $S_{RS}$ (c) $\lambda_1$ and $\lambda_3$ (dotted curve)

Fig. 2 $\alpha$ dependence of $q$ and $R$ (dotted curve) in the RS solution for $\delta = 0$.
(a) $T = 5$ (b) $T = 0.5$

Fig. 3 $\alpha$ dependence of the entropy in the RS solution for $\delta = 0$. Solid and dashed
curves denote $S^{I}_{RS}$ and $S^{II}_{RS}$, respectively.
(a) $T = 5$ (b) $T = 0.5$

Fig. 4 $\alpha$ dependence of $\lambda_1$ and $\lambda_3$ (dotted curve) in the RS solution for $\delta = 0$.
$\lambda_1^{I}$ and $\lambda_3^{I}$ correspond to the curves starting from $-1$ for small $\alpha$.
(a) $T = 5$ (b) $T = 0.5$

Fig. 5 $\alpha$ dependence of $q$ and $R$ (dotted curve) in the RS solution for $T = 5$.
(a) $\delta = 0.3$ (b) $\delta = 1.5$

Fig. 6 $\alpha$ dependence of $T_c$ for $\delta = 1$.

Fig. 7 $\ln\alpha$ vs. $\ln\Delta\epsilon_g$. A line segment with the estimated gradient is also shown.

Fig. 8 $\alpha$ dependence of $T_c$ for $\delta = 0$. Solid curve: 1RSB(I), dashed curve: 1RSB(II).

Fig. 9 $\alpha$ dependence of the free energy. $f^{I}_{1RSB}$ (solid curve) and $f^{II}_{1RSB}$ (dashed curve).

Fig. 10 $\alpha$ dependence of $R$ and $\delta R$ in the minimum-error algorithm for $\delta = 1$. Dotted
curve: $N = 10$, dashed curve: $N = 15$, solid curve: $N = 17$.
(a) $R$ (b) $\delta R$

Fig. 11 $\alpha$ dependence of several quantities in the minimum-error algorithm for $\delta = 1$.
+: numerical results for $N = 17$, bars indicate standard deviations. Dashed curve: RS
solution. Solid curve: 1RSB solution.
(a) $R$ (b) $q$ (RS) and $q_0$ (1RSB) (c) $\Delta\epsilon_g$

Fig. 12 Asymptotic behavior of $R$ in the minimum-error algorithm for $\delta = 1$. +:
numerical results for $N = 15$, bars indicate standard deviations. Dashed curve: RS
solution. Solid curve: 1RSB solution.
(a) $0 < \alpha < 15$ (b) $15 < \alpha < 50$ (c) $50 < \alpha < 100$

Fig. 13 $\alpha$ dependence of $R$ and $\Delta\epsilon_g$ in the Gibbs algorithm for $\delta = 1$. +: numerical
results for $N = 12$ with $T = 1$, dashed curve: RS solution with $T = 0$, dotted curve:
RS solution with $T = 1$, solid curve: 1RSB solution.
(a) $R$ (b) $\Delta\epsilon_g$

Fig. 14 $\alpha$ dependence of $q$ and $q_0$ in the Gibbs algorithm for $\delta = 1$. +: numerical
results with standard deviations for $N = 12$, dashed curve ($q$): RS solution with $T = 0$,
dotted curve ($q$): RS solution with finite temperature, solid curve ($q_0$): 1RSB solution.
(a) $T = 0.15$ (b) $T = 0.5$ (c) $T = 5.0$

Fig. 15 $T$ dependence of $P(q)$ in the Gibbs algorithm for $\delta = 1$. Histogram: numerical
results for $N = 12$ and $p = 60$, solid line: 1RSB solution, dotted line: RS solution.
(a) $T = 0.15$ (b) $T = 0.5$ (c) $T = 5.0$

Fig. 16 $\alpha$ dependence of several quantities in the minimum-error algorithm for $\delta = 0$.
+: numerical results for $N = 17$, bars indicate standard deviations. Dashed curve:
RS ($T = 0$), solid curve: 1RSB(I), dashed curve: 1RSB(II).
(a) $R$ (b) $q$ (RS) and $q_0$ (1RSB) (c) $\Delta\epsilon_g$

Fig. 17 Asymptotic behavior of $R$ in the minimum-error algorithm for $\delta = 0$. +:
numerical results for $N = 15$, bars indicate standard deviations. Dashed curve: RS
solution. Solid curve: 1RSB solution.

Fig. 18 $\alpha$ dependence of $R$ and $\Delta\epsilon_g$ in the Gibbs algorithm for $\delta = 1$ and $T = 1.0$.
+: numerical results for $N = 12$, dashed curve: RS solution with $T = 0$, dotted curve:
RS solution for $T = 1.0$, solid curve: 1RSB solution.
(a) $R$ (b) $\Delta\epsilon_g$

Fig. 19 $\alpha$ dependence of $q$ and $q_0$ in the Gibbs algorithm for $\delta = 0$. +: numerical
results with standard deviations for $N = 12$, dashed curve: RS solution with $T = 0$,
dotted curve: RS solution with finite temperature, solid curve: 1RSB solution.
(a) $T = 0.15$ (b) $T = 0.5$ (c) $T = 5.0$

Fig. 20 $T$ dependence of $P(q)$ in the Gibbs algorithm for $\delta = 0$. $T_c \simeq 0.7$. Histogram:
numerical results for $N = 12$ and $p = 60$, solid line: 1RSB solution, dotted line: RS
solution.
(a) $T = 0.15$ (b) $T = 0.5$ (c) $T = 5.0$
1. Introduction

In the problem of learning from examples by feed-forward networks, learning curves of
the generalization error $\epsilon_g$ have been calculated for various types of networks. From
these studies, it turned out that when the number of examples $p$ is large relative to
the number of synaptic weights $N$, that is, when $\alpha = p/N$ is large, the learning curves
exhibit only a few types of behaviour. For example, the learning curves of networks
with continuous weights all exhibit power laws in $\alpha$, where the exponent $\gamma$ depends on
the architecture, the type of weight vectors and so on.

On the other hand, for the case of discrete weights it was shown that, in addition to
the power laws, there exists Perfect Learning (PL) for deterministic and realizable
cases [11]. That is, the learner's weight vector coincides with the teacher's weight
vector at a finite $\alpha$. It is therefore of great interest to clarify the conditions for the
existence of the PL in the case of discrete weights and in the presence of external noise.



In the previous paper, we reported on these conditions. The results are similar to
those of Seung, who classified the learning behaviours of Ising networks by introducing
two exponents $y$ and $z$. We gave another interpretation of $y$ and $z$ and obtained the
relation $y = 2z$. Further, the asymptotic behaviours of the learning curves were also
investigated.

The purpose of this paper is to give the detailed derivation of the conditions for
the existence of Perfect Learning and of the asymptotic behaviours of learning curves
in the problem of learning from stochastic examples by perceptrons with Ising weights,
using the replica method.

In the following section, we formulate the problem. In §3, we analyse the replica
symmetric (RS) solution. The conditions for the existence of the PL are given in §4. The
one-step replica symmetry breaking (1RSB) solution is studied in §5. The results of
numerical calculations are given in §6. §7 is devoted to summary and discussion.

2. Formulation 

We consider a stochastic target relation between an $N$-dimensional input vector $\mathbf{x}$ and
a binary output $r \in \{1, -1\}$, which is represented by a conditional probability $p_r(r|\mathbf{x})$.
It is assumed that an input vector $\mathbf{x}$ is normalized as $|\mathbf{x}| = \sqrt{N}$ and that $p_r(r|\mathbf{x})$ is a
function of the inner product between the input $\mathbf{x}$ and the optimal Ising weight $\mathbf{w}^o$ as

$$p_r(+1|\mathbf{x}) = \mathcal{P}(u^o) = \frac{1 + P(u^o)}{2}, \qquad u^o = (\mathbf{x}\cdot\mathbf{w}^o)/\sqrt{N}. \qquad (1)$$

We further assume that the function $P(u)$ is non-decreasing with respect to $u$ and behaves as

$$P(u) \simeq a\,\mathrm{sgn}(u)|u|^{\delta}, \quad (\delta \ge 0), \qquad (2)$$

near $u = 0$. Further, $P(-u) = -P(u)$ is assumed for brevity.

The case of $\delta = 0$ corresponds to the output noise model, in which the output of the
target perceptron is reversed by noise with some probability. On the other
hand, $\delta = 1$ corresponds to the input noise model [10], in which the input of the target
perceptron is corrupted by Gaussian noise with mean zero.
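The two noise models can be made concrete with a short numerical sketch. It assumes the reading of equation (1) in which the probability of output $+1$ is $(1 + P(u))/2$, uses $P(u) = k\,\mathrm{sgn}(u)$ for output noise and $P(u) = 1 - 2H(u)$ for input noise (the functions used in the numerical sections of this paper), and treats the value of `k` as a hypothetical parameter chosen only for illustration.

```python
import math


def H(x: float) -> float:
    """Gaussian tail probability H(x) = int_x^infinity Dy."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def P_output_noise(u: float, k: float = 0.5) -> float:
    """delta = 0 case: P(u) = k * sgn(u). The value of k is an illustrative assumption."""
    if u == 0.0:
        return 0.0
    return k if u > 0 else -k


def P_input_noise(u: float) -> float:
    """delta = 1 case: P(u) = 1 - 2 H(u), which is linear in u near u = 0."""
    return 1.0 - 2.0 * H(u)


def prob_plus(P, u: float) -> float:
    """Probability of output +1 for local field u, assuming p_r(+1|x) = (1 + P(u)) / 2."""
    return 0.5 * (1.0 + P(u))


if __name__ == "__main__":
    for u in (-1.0, -0.1, 0.1, 1.0):
        print(u, prob_plus(P_output_noise, u), prob_plus(P_input_noise, u))
```

Near $u = 0$ the input-noise function grows linearly with slope $2/\sqrt{2\pi}$, whereas the output-noise function jumps by $2k$, which is the distinction between $\delta = 1$ and $\delta = 0$ in equation (2).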

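We assume that a set of $p$ examples $\xi_p = \{(\mathbf{x}^1, r_1^o), (\mathbf{x}^2, r_2^o), \ldots, (\mathbf{x}^p, r_p^o)\}$ is obtained
as follows: $\mathbf{x}^\mu$ is independently and uniformly drawn from a hypersphere of radius $\sqrt{N}$
centred at the origin in the $N$-dimensional space, and $r_\mu^o$ is obtained with the conditional
probability $p_r(r_\mu^o|\mathbf{x}^\mu)$ for each $\mathbf{x}^\mu$. For a given realization of examples $\xi_p$, the number
of false predictions is given as

$$E[\mathbf{w}, \xi_p] = \sum_{\mu=1}^{p} \Theta(-r_\mu^o u_\mu), \qquad u_\mu \equiv (\mathbf{x}^\mu\cdot\mathbf{w})/\sqrt{N}, \qquad (3)$$

where $\Theta(x) = 1$ for $x > 0$ and $\Theta(x) = 0$ for $x < 0$. The performance of the learning is
evaluated by the generalization error $\epsilon_g$. This is expressed as

$$\epsilon_g = \langle \mathcal{P}(u^o)(1 - \Theta(u)) + (1 - \mathcal{P}(u^o))\Theta(u) \rangle = \epsilon_{\min} + 2\int_0^\infty Dy\, P(y)\, H\!\left(\frac{Ry}{\sqrt{1 - R^2}}\right), \qquad (4)$$

$$\epsilon_{\min} = \frac{1}{2} - \int_0^\infty Dy\, P(y),$$

where $\langle\cdots\rangle$ represents the average over a novel example and $\epsilon_{\min}$ is the minimum
value of the generalization error, obtained by the optimal weight $\mathbf{w}^o$. $R$ is the overlap
between the optimal weight vector and a weight vector of a learner, $R = (\mathbf{w}^o\cdot\mathbf{w})/N$.
Further, as usual, $Dy = \exp(-y^2/2)\,dy/\sqrt{2\pi}$ and $H(x) = \int_x^\infty Dy$. In particular, when
$\Delta R = 1 - R$ is small, we obtain the relation

$$\Delta\epsilon_g \equiv \epsilon_g - \epsilon_{\min} \simeq \frac{2s}{(1 + \delta)\sqrt{2\pi}}\,(2\Delta R)^{(1+\delta)/2}, \qquad (5)$$

where $s = a\int_0^\infty Dy\, y^{\delta+1}$.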
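As a numerical illustration of the generalization error, the sketch below evaluates it by simple quadrature. It assumes the form $\epsilon_g = \epsilon_{\min} + 2\int_0^\infty Dy\,P(y)H(Ry/\sqrt{1-R^2})$ with $\epsilon_{\min} = 1/2 - \int_0^\infty Dy\,P(y)$, which follows from the definitions above for an odd $P$; the function names, the cutoff `ymax` and the grid size are illustrative choices, not part of the original analysis.

```python
import math


def H(x: float) -> float:
    """Gaussian tail probability H(x) = int_x^infinity Dy."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def gauss_density(y: float) -> float:
    """Density of the measure Dy."""
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)


def integrate(f, lo: float, hi: float, n: int = 20000) -> float:
    """Composite midpoint rule on [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


def eps_min(P, ymax: float = 10.0) -> float:
    """Minimum generalization error, 1/2 - int_0^inf Dy P(y)."""
    return 0.5 - integrate(lambda y: gauss_density(y) * P(y), 0.0, ymax)


def eps_g(P, R: float, ymax: float = 10.0) -> float:
    """Generalization error at overlap R (0 <= R <= 1) for an odd function P."""
    if R >= 1.0:
        return eps_min(P, ymax)
    kappa = R / math.sqrt(1.0 - R * R)
    excess = integrate(lambda y: gauss_density(y) * P(y) * H(kappa * y), 0.0, ymax)
    return eps_min(P, ymax) + 2.0 * excess


if __name__ == "__main__":
    P_input = lambda y: 1.0 - 2.0 * H(y)  # input-noise model, only y > 0 is needed
    for R in (0.0, 0.5, 0.9, 0.99):
        print(R, eps_g(P_input, R))
```

For $P(y) = k$ on $y > 0$ the integral can be done in closed form, giving $\epsilon_g = \epsilon_{\min} + (k/\pi)\arccos R$, which provides a convenient check of the quadrature.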

In this paper, we adopt the Gibbs algorithm with temperature $T$ as the learning
algorithm. The minimum-error algorithm, which minimizes the number of false
predictions on the presented examples, is obtained by taking the limit $T \to +0$.

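From the energy defined by equation (3), the partition function $Z$ with the
inverse temperature $\beta$ is given by

$$Z = \mathrm{Tr}_{\mathbf{w}}\, e^{-\beta E[\mathbf{w}, \xi_p]} = \mathrm{Tr}_{\mathbf{w}} \prod_{\mu=1}^{p} \left[e^{-\beta} + (1 - e^{-\beta})\Theta(r_\mu^o u_\mu)\right],$$

where $\mathrm{Tr}_{\mathbf{w}}$ denotes the summation over all configurations of $\mathbf{w}$. The average free energy
$f$ per weight is calculated by the standard recipe

$$-\beta N f = \langle \ln Z \rangle_{\xi_p, \mathbf{w}^o} = \lim_{n \to 0} \frac{1}{n}\left(\langle Z^n \rangle_{\xi_p, \mathbf{w}^o} - 1\right),$$

where $\langle\cdots\rangle_{\xi_p, \mathbf{w}^o}$ denotes the average over quenched variables.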
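For small $N$ the partition function can be evaluated by brute force, which gives a direct check that the Boltzmann sum and the product form over examples agree. The sketch below is illustrative only: it enumerates all $2^N$ Ising weight vectors, generates examples with output noise (labels flipped with probability $(1-k)/2$, where `k` is a hypothetical parameter), and all helper names are assumptions rather than anything defined in the paper.

```python
import itertools
import math
import random


def local_field(x, w):
    """u = (x . w) / sqrt(N)."""
    return sum(a * b for a, b in zip(x, w)) / math.sqrt(len(w))


def make_examples(w_teacher, p, k, rng):
    """Draw p examples (x, r) on the sphere |x| = sqrt(N) with output noise:
    the teacher label is flipped with probability (1 - k) / 2."""
    n = len(w_teacher)
    examples = []
    for _ in range(p):
        g = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(v * v for v in g))
        x = [v * math.sqrt(n) / norm for v in g]
        r = 1 if local_field(x, w_teacher) > 0 else -1
        if rng.random() < 0.5 * (1.0 - k):
            r = -r
        examples.append((x, r))
    return examples


def training_error(w, examples):
    """E[w, xi_p]: number of examples with r * u < 0."""
    return sum(1 for x, r in examples if r * local_field(x, w) < 0)


def partition_function(examples, n, beta):
    """Z = Tr_w exp(-beta E), summed over all 2^N Ising weight vectors."""
    return sum(math.exp(-beta * training_error(w, examples))
               for w in itertools.product((-1, 1), repeat=n))


def partition_function_product(examples, n, beta):
    """The same Z written as a product of e^-beta + (1 - e^-beta) Theta(r u)."""
    eb = math.exp(-beta)
    total = 0.0
    for w in itertools.product((-1, 1), repeat=n):
        term = 1.0
        for x, r in examples:
            theta = 1.0 if r * local_field(x, w) > 0 else 0.0
            term *= eb + (1.0 - eb) * theta
        total += term
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    teacher = [1, -1, 1, 1, -1, 1]
    data = make_examples(teacher, 12, 0.5, rng)
    print(partition_function(data, 6, 1.0), partition_function_product(data, 6, 1.0))
```

The Gibbs algorithm then corresponds to sampling $\mathbf{w}$ with probability $e^{-\beta E}/Z$; exhaustive enumeration of this kind is only practical for small $N$.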

$\langle Z^n \rangle_{\xi_p, \mathbf{w}^o}$ becomes a function of several replica order parameters, namely the
overlap between weight vectors of learners $q^{\alpha\beta} = (\mathbf{w}^\alpha\cdot\mathbf{w}^\beta)/N$, its conjugate $\hat q^{\alpha\beta}$, the overlap
between the weight vector of a learner and the optimal weight vector $R^\alpha = (\mathbf{w}^o\cdot\mathbf{w}^\alpha)/N$,
and its conjugate $\hat R^\alpha$. See Appendix A for a derivation of the free energy.

3. RS solution 

Let us consider the RS solution. For the RS solution, no quantity depends on
the replica indices and we put $q^{\alpha\beta} = q$, $\hat q^{\alpha\beta} = \hat q$, $R^\alpha = R$ and $\hat R^\alpha = \hat R$. Then, the RS
free energy $f_{RS}$ becomes

$$-\beta f_{RS}(q, \hat q, R, \hat R, \beta) = -\frac{\hat q}{2}(1 - q) - R\hat R + \alpha K + I, \qquad (6)$$
K = I Dy2V{y) J DuhxHC^^^ ^ll^^ ) = j Du\nH{u/Q)E{u/Q)i7) 
$$I = \int Dt\, \ln[2\cosh(\sqrt{\hat q}\,t + \hat R)], \qquad (8)$$
E{u) = j Dy2Vi-A) = 1 - e'^'/' Dyie-^'y - e''y)P{iy), (9) 
H{u) = e-" + (1 - e-'')H{u), A ^ + ^/l^Qu = Civ - v), 
^ = \/l , Q = \ ,v = — —u, x = j^- 

3.1. Saddle point equations (S. P. E.) 
The saddle point equations are given by

$$q = \int Du\, \tanh^2(\sqrt{\hat q}\,u + \hat R), \qquad (10)$$

$$R = \int Du\, \tanh(\sqrt{\hat q}\,u + \hat R), \qquad (11)$$
q^^f Du{<p{u)fE{u), (12) 



1-q 

^ = - 7=lf / ^^'^^^^ / Dyy^ViA) = -^7=^ / Du^{u)w{u), (13) 



WiUj 



J DyyP{K) = e-^'l'' DyP{iy) \{y + t;)e-^^ ^ {y - t;)e^^I,14) 



i\ du _o2„2/2 s H'(u) 
Du = —j=e ^ '\ (p{u) = — 
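The two saddle point equations that express $q$ and $R$ through the conjugate variables, $q = \int Du \tanh^2(\sqrt{\hat q}\,u + \hat R)$ and $R = \int Du \tanh(\sqrt{\hat q}\,u + \hat R)$, are easy to evaluate numerically. The following sketch does so with a plain midpoint rule; the function names, cutoff and grid size are illustrative assumptions, and it does not attempt to solve the full set of coupled equations.

```python
import math


def gauss_average(f, tmax: float = 10.0, n: int = 4000) -> float:
    """Average of f(t) over a standard Gaussian, using the midpoint rule on [-tmax, tmax]."""
    h = 2.0 * tmax / n
    total = 0.0
    for i in range(n):
        t = -tmax + (i + 0.5) * h
        total += math.exp(-0.5 * t * t) * f(t)
    return total * h / math.sqrt(2.0 * math.pi)


def overlaps(q_hat: float, R_hat: float):
    """Return (q, R) for given conjugate variables q_hat >= 0 and R_hat."""
    s = math.sqrt(q_hat)
    q = gauss_average(lambda t: math.tanh(s * t + R_hat) ** 2)
    R = gauss_average(lambda t: math.tanh(s * t + R_hat))
    return q, R


if __name__ == "__main__":
    for q_hat, R_hat in ((0.5, 0.2), (2.0, 2.0), (4.0, 12.0)):
        print(q_hat, R_hat, overlaps(q_hat, R_hat))
```

Two properties are visible in the output: $q$ and $R$ approach 1 when $\hat R$ and $\hat R/\sqrt{\hat q}$ become large, and on the line $\hat q = \hat R$ the two integrals coincide, so that $q = R$.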



For later use, we give the expression for the entropy $S_{RS}$,

$$S_{RS} = -\beta f_{RS} - \alpha\beta e^{-\beta} J, \qquad (15)$$

J^j Dy2V{y) j Du \ ^ ' . 

Defining $L$ as $L = K - \beta e^{-\beta} J$, $S_{RS}$ becomes

$$S_{RS} = -\frac{\hat q}{2}(1 - q) - R\hat R + I + \alpha L, \qquad (16)$$

where $L$ is expressed as




H{u/Q) 
H{u/Q) 



}■ 



(17) 



Also, the energy (training error) per weight is expressed as

$$\epsilon_t = -\alpha e^{-\beta} J. \qquad (18)$$

3.2. Numerical calculations of S.P.E. for the RS solution

(I) $\delta = 1$

Figure 1 shows the $\alpha$ dependence of $q$, $R$, $S_{RS}$, and of $\lambda_1$ and $\lambda_3$, which are indicators
of AT stability. The RS solution is stable only when both $\lambda_1$ and $\lambda_3$ are negative. From the
numerical results, it seems that $q$ and $R$ tend to 1 as $\alpha \to \infty$. In this case, the entropy
$S_{RS}$ becomes zero at some value of $\alpha$, $\alpha_s(T)$, and $\lambda_3$ becomes zero at another value of $\alpha$,
$\alpha_{AT}(T)$, for any $T$.

(II) $\delta = 0$

For the numerical calculations, we treated $P(y) = k\,\mathrm{sgn}(y)$ with a constant $k < 1$, for which
$\epsilon_{\min} = (1 - k)/2$. In Figures 2-4, we depict the $\alpha$ dependence of $q$, $R$, $S_{RS}$, $\lambda_1$ and
$\lambda_3$ for several temperatures. The most interesting feature is that there are no solutions in
which $q$ and $R$ tend to 1 as $\alpha \to \infty$. There are two branches of solutions, which we call
branch I and branch II. Each solution is characterized by its behaviour in the limit $\alpha \to 0$. In
branch I, $q$ and $R$ tend to 0. On the other hand, in branch II, $q$ and $R$ tend to 1. We attach the
superscript I or II to any quantity estimated in branch I or II, respectively. From these figures,
we note that when $T$ is greater than some temperature, say $T_s$, the solutions in both branches are
AT-stable and their entropies are positive. When $T < T_s$, the entropy of branch II, $S^{II}_{RS}$,
becomes negative for any $\alpha$. Thus, $T_s$ is determined by the condition that $S^{II}_{RS}$ changes
its sign at small values of $\alpha$. There exists a critical value $\alpha = \alpha_s(T)$ at which the entropy
of branch I, $S^{I}_{RS}$, becomes 0. Then, $S^{I}_{RS} > 0$ for $\alpha < \alpha_s(T)$ and $S^{I}_{RS} < 0$ for
$\alpha > \alpha_s(T)$. Also, we note that for branch I, the AT instability takes place at
$\alpha = \alpha_{AT}(T)$ when $T$ is smaller than some temperature, say $T_{AT}$ ($< T_s$). For $T < T_{AT}$,
$\lambda_3^{I}$ is positive for $\alpha > \alpha_{AT}(T)$, whereas $\lambda_3^{II}$ is positive for any $\alpha$. On the
other hand, $\lambda_1^{I}$ and $\lambda_1^{II}$ are always negative for any $T$ and $\alpha$.

(III) general $\delta$

The numerical calculations were performed for several values of $\delta$ and $T$. For example,
when $T = 5$ and $\delta = 0.3$, $q$ and $R$ tend to 1 as $\alpha \to 0$. See Figure 5(a). In this
case, the RS solution is AT-stable and its entropy is positive. There exists another
case, in which $q$ and $R$ tend to 1 as $\alpha \to \infty$. See Figure 5(b). In this case, $S_{RS}$ decreases
and becomes 0 at a finite $\alpha$, $\alpha_s(T)$, for any $T$. $\lambda_3$ also becomes 0 at a finite $\alpha$, $\alpha_{AT}(T)$.

Within our numerical calculations, for any value of $\delta$ and for $T < T_{AT}$, we obtain
the relation $\alpha_{AT}(T) > \alpha_s(T)$, and $\alpha_{AT}(T)$ and $\alpha_s(T)$ increase as $T$ increases, as
long as these quantities are defined. Further, except for the case of $\delta = 0$, $\alpha_{AT}(T)$ and
$\alpha_s(T)$ increase as $\delta$ increases.

For any value of $\delta$, the entropy becomes negative for small $T$. Thus, in this case
we have to consider the replica symmetry breaking ansatz.

3.3. Asymptotic relations for $\hat q$ and $\hat R$ when $q \to 1$ and $R \to 1$

In this section, in order to derive the asymptotic learning curves and to discuss the
conditions for the existence of the PL, we summarize the asymptotic relations for $\hat q$ and $\hat R$
when $q \to 1$ and $R \to 1$ for any values of $\alpha$, $\beta$ and $\chi$, obtained by evaluating equations
(12) and (13) under these limits. See Appendix B for the derivation.

$\hat q$ and $\hat R$ are estimated as follows. For $0 < \beta < \infty$,



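$$\hat q \simeq \frac{\alpha}{\sqrt{\Delta q}}\, g_{1,\delta}(\chi, \beta) \quad \text{for } \delta \ge 0, \qquad (19)$$

$$\hat R \simeq \alpha\,(\Delta q)^{(\delta-1)/2}\, g_{2,\delta}(\chi, \beta) \quad \text{for } \delta \ge 0. \qquad (20)$$

For $\beta = \infty$,

$$\hat q \simeq \frac{\alpha}{(\Delta q)^2}\, g_3 \quad \text{for } \delta > 0, \text{ or for } \delta = 0 \text{ and } k < 1, \qquad (21)$$

$$\hat q \simeq \frac{\alpha}{\sqrt{\Delta q}}\, g_{3,D} \quad \text{for the deterministic case}, \qquad (22)$$

$$\hat R \simeq \frac{\alpha}{\Delta q}\, g_4 \quad \text{when } P(y) \text{ is not constant for } y > 0, \qquad (23)$$

$$\hat R \propto \alpha\sqrt{\Delta q} \quad \text{when } P(y) = k \text{ for } y > 0 \text{ and } k < 1, \qquad (24)$$

$$\hat R = \hat q \simeq \frac{\alpha}{\sqrt{\Delta q}}\, g_{3,D} \quad \text{for the deterministic case}. \qquad (25)$$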



Expressions for the $g$'s are given as follows.

$$E_\delta(u, \chi) = 1 \ \text{ for } \delta > 0, \qquad E_0(u, \chi) = 1 - k + 2kH(u/\chi),$$



g2Ax,P)^^il-e-^)-{i + x-'f-'^^' 

V2n X 



oo roo 1 

5-1 ' 



X / Dzz'-' / Dt\^ , + ^ , ], for 5>0, 

/•oo 2 /■ h(u) 

^00 

94 = / /^yP(y)(y'-l). 
Now, let us see the behaviours of the $g$'s for later use. $g_{1,\delta}(\chi, \beta)$ is finite for $0 < \beta < \infty$ and
for $\delta \ge 0$ for any $\chi$, since $\tilde H(x)$ is bounded. As for $g_{2,\delta}(\chi, \beta)$, we obtain for $0 < \beta < \infty$
and for $\delta > 0$,

92Ax,P)^J^oX~', forx<l, (26) 
~ iyiX-\ for X > 1, (27) 
~ finite for finite x, (28) 
where i/q = ^^Z^ 1 + 1^} and i/i = In the case of 5 = 0, for 

" V27r H(z) H{—z)-' \/27r ' 

< /3 < 00, g2fl is finite except for x ^ 1, that is, 
2k 

g2,o tanh(/?/2) for X « 1, (29) 

TT 

~ ^X-' for X » 1, (30) 

TT 

~ finite for finite x- (31) 

On the other hand, the quantities for $\beta = \infty$, $g_3$, $g_{3,D}$ and $g_4$, are all finite.

Here, for later use, we give the asymptotic form of the entropy $S_{RS}$. $L$ is evaluated
as



^Agr(x,/3), (32) 
ix. P)^^ f duEsiu, x){ln[l + (e^ - l)H(u)] - (3^\. (33) 



$r(\chi, \beta)$ is finite for $0 < \beta < \infty$ and for any $\delta$ and any $\chi$. Then, as $q \to 1$ and $R \to 1$,
for $0 < \beta < \infty$ and for any $\delta$ and any $\chi$, $S_{RS}$ is expressed as follows.

Srs = - (1 - Ai?)^ + r(x, /3) + /■ (34) 

3.4. Asymptotic solutions of S.P.E. when $q \to 1$ and $R \to 1$

The equations for $\Delta q = 1 - q$ and $R$ are obtained by

$$\Delta q = 2\frac{\partial I}{\partial \hat q}, \qquad R = \frac{\partial I}{\partial \hat R}.$$

Thus, we have to estimate $I$ in the asymptotic region. $I$ behaves differently according
to the value of $\mu = \hat R/(2\hat q)$. We give the expressions for $I$ in Appendix C.
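The relations between $I$ and the order parameters can be checked numerically. The sketch below evaluates $I(\hat q, \hat R) = \int Dt \ln[2\cosh(\sqrt{\hat q}\,t + \hat R)]$ by quadrature, differentiates it by central differences, and compares the result with the direct $\tanh$ integrals for $q$ and $R$. The step size, grid and function names are illustrative assumptions.

```python
import math


def gauss_average(f, tmax: float = 12.0, n: int = 6000) -> float:
    """Average of f(t) over a standard Gaussian, using the midpoint rule on [-tmax, tmax]."""
    h = 2.0 * tmax / n
    total = 0.0
    for i in range(n):
        t = -tmax + (i + 0.5) * h
        total += math.exp(-0.5 * t * t) * f(t)
    return total * h / math.sqrt(2.0 * math.pi)


def log_2cosh(x: float) -> float:
    """ln(2 cosh x), written so that it does not overflow for large |x|."""
    ax = abs(x)
    return ax + math.log1p(math.exp(-2.0 * ax))


def I_func(q_hat: float, R_hat: float) -> float:
    """I = int Dt ln[2 cosh(sqrt(q_hat) t + R_hat)]."""
    s = math.sqrt(q_hat)
    return gauss_average(lambda t: log_2cosh(s * t + R_hat))


def overlaps_from_I(q_hat: float, R_hat: float, h: float = 1e-4):
    """(q, R) from 1 - q = 2 dI/dq_hat and R = dI/dR_hat, by central differences."""
    dI_dq = (I_func(q_hat + h, R_hat) - I_func(q_hat - h, R_hat)) / (2.0 * h)
    dI_dR = (I_func(q_hat, R_hat + h) - I_func(q_hat, R_hat - h)) / (2.0 * h)
    return 1.0 - 2.0 * dI_dq, dI_dR


def overlaps_direct(q_hat: float, R_hat: float):
    """(q, R) from the tanh integrals of the saddle point equations."""
    s = math.sqrt(q_hat)
    q = gauss_average(lambda t: math.tanh(s * t + R_hat) ** 2)
    R = gauss_average(lambda t: math.tanh(s * t + R_hat))
    return q, R


if __name__ == "__main__":
    print(overlaps_from_I(1.5, 0.8))
    print(overlaps_direct(1.5, 0.8))
```

Both routes should agree to within the finite-difference error, which confirms that the asymptotic analysis only requires an estimate of $I$ in the relevant regime of $\mu$.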

After some algebra, we obtain the following asymptotic behaviours.



(i) $\delta > 1/3$.

For $\alpha \gg 1$,



, In a , 25 5 , 25 

a go(3d — 1) 

,ln a, 2(1+^) „ 1/5, Ina, 
Aqc^qopio' ARc^Roli^i ; 

a a 



On the conditions for the existence of Perfect Learning 

(ii) $\delta = 1/3$.

For $\alpha \gg 1$,

12'- 1 1 



(iii) $0 < \delta < 1/3$.
For $\alpha \ll 1$,



In — 2S 6 - 2,5 

q; go(l — 3()j 

1+i In — 2(1+5) 1 /X In — 2 

a a 

-l+i. 4S 1 . 1+S - - -lizi 2S 1 . 1-5 

g ~ goA*o « i-3^(ln-)i-3'5, i2 ~ i?oA*o « i-s^ (In — ) i-3« . 

q; q; 



(iv) $\delta = 0$.

(a) $0 < \mu < 1$.
For $\alpha \ll 1$,

Ai? ~ -^ij(fi)a\\n -r^, Aq = 2/xAi?, 
V27r o; 

12 a a Zjj, 
$\chi$ and $\mu$ are determined by the following equations.

2^i,o(x) 1 + 



(35) 



(b) $1 < \mu$.
For $\alpha \ll 1$,



. = (36) 



1 1 

Ai? ~ 2a.(-ln-)-^ Aq = 2AR, 
a a 



^ 2// 1 1 R 

R ~- -In -In-, q=—. 

2/1 — 1 a a 2/1 



$\mu$ is determined by



^2,o(x = 0,/3) 



Now, let us check the validity of the above solutions. First, let us consider the case of
$\delta > 0$. We only have to check the conditions $\Delta q \ll 1$, $\Delta R \ll 1$ and $\hat R \gg 1$. For $\delta > 1/3$,
the conditions are



g-,P'^^^ (Ai?«l), 

a 



,2(5-1)^2 

^ — 

These are satisfied if $\alpha \gg 1$. For $\delta = 1/3$, the condition is $q_0 > 0$, which is automatically
satisfied for $\beta > 0$. For $0 < \delta < 1/3$, the conditions are



a 

5iJ/?'«!^ (Ai?«l), 
a 

These are satisfied if $\alpha \ll 1$. Therefore, no extra condition is necessary for the case of
$\delta > 0$. On the other hand, for $\delta = 0$, in the case (a) the condition is that there is a
positive solution $\chi$ of equation (35), and in the case (b) the condition is $\mu > 1$, where
$\mu$ is defined by equation (37). When $\beta \ll 1$, $\mu$ is estimated as

\x ~ ^ = for the case (a) 

~ for the case (b). 

P 

Thus, the case (a) is impossible for high temperatures. On the other hand, when $\beta \gg 1$,
since $g_{1,0}$ becomes very large and $g_{2,0}$ remains bounded in both cases, we obtain
$\mu \ll 1$. Therefore, the case (b) is impossible for low temperatures.

The results obtained in this section suggest that for $\delta < 1/3$ the PL exists and the
solution with $q < 1$ exists only in a finite region of $\alpha$, and that for $\delta > 1/3$ the PL does not
exist and the solution with $q < 1$ exists for any $\alpha$. However, as is shown in the next two
sections, this is not correct. One reason is that the entropy of the RS solution becomes
negative for $T \to 0$ when $\alpha$ is large enough. The other reason is that the condition
$\delta < 1/3$ is different from the existence condition for the PL.

In the next section, we investigate the necessary and sufficient conditions for the
existence of Perfect Learning.

4. Perfect Learning 

In Perfect Learning, the weight vector of a student coincides with the optimal weight
vector at a finite value of $\alpha$, $\mathbf{w} = \mathbf{w}^o$. In this case, $q = 1$ and $R = 1$. From equations
(10) and (11), the necessary and sufficient conditions for $q = 1$ and $R = 1$ at a finite value
of $\alpha$ are

$$\hat R \to \infty \quad \text{and} \quad r \equiv \frac{\hat R}{\sqrt{\hat q}} = \infty. \qquad (38)$$

In the case of the PL, we impose the further condition $\hat q = \hat R$ when the limits $q \to 1$ and
$R \to 1$ are taken, because in Perfect Learning the teacher and a student coincide.
Therefore, we have $\chi \simeq 1$, and then $\zeta = Q = \sqrt{\Delta q}$. Thus, for $0 < \beta < \infty$
we obtain from equations (19) and (20),
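$$\hat q = \frac{\alpha}{\sqrt{\Delta q}}\, g_{1,\delta}(\chi = 1, \beta), \qquad (39)$$

$$\hat R = \alpha\,(\Delta q)^{(\delta-1)/2}\, g_{2,\delta}(\chi = 1, \beta). \qquad (40)$$

For $\beta = \infty$,

$$\hat q \simeq \frac{\alpha}{(\Delta q)^2}\, g_3 \quad \text{for } \delta > 0, \text{ or for } \delta = 0 \text{ and } k < 1, \qquad (41)$$

$$\hat q \simeq \frac{\alpha}{\sqrt{\Delta q}}\, g_{3,D} \quad \text{for the deterministic case}, \qquad (42)$$

$$\hat R \simeq \frac{\alpha}{\Delta q}\, g_4 \quad \text{when } P(y) \text{ is not constant for } y > 0, \qquad (43)$$

$$\hat R \propto \alpha\sqrt{\Delta q} \quad \text{when } P(y) = k \text{ for } y > 0 \text{ and } k < 1, \qquad (44)$$

$$\hat R = \hat q \simeq \frac{\alpha}{\sqrt{\Delta q}}\, g_{3,D} \quad \text{for the deterministic case}. \qquad (45)$$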

Let us see what is derived from these conditions. First, let us consider the case of
$0 < \beta < \infty$. In this case, both $g_{1,\delta}(\chi = 1, \beta)$ and $g_{2,\delta}(\chi = 1, \beta)$ are finite for $\delta \ge 0$.
See subsection 3.3. Thus, from (39) and (40), the conditions for $\hat q \to \infty$, $\hat R \to \infty$ and
$r \to \infty$ as $q \to 1$ for any $\alpha$ are derived as

$$\hat q \to \infty \quad \text{for any } \delta \ge 0 \text{ and } 0 < \beta < \infty, \qquad (46)$$

$$\hat R \to \infty \quad \text{for } 0 \le \delta < 1 \text{ and } 0 < \beta < \infty, \qquad (47)$$

$$r \propto \alpha^{1/2}(\Delta q)^{(2\delta-1)/4} \to \infty \quad \text{for } 0 \le \delta < 1/2 \text{ and } 0 < \beta < \infty. \qquad (48)$$

Hence, the condition for the PL is $0 \le \delta < 1/2$. Next, let us consider the case of $\beta = \infty$.
When $P(y)$ is not constant for $y > 0$, the $g$'s are finite. Then, from (41) and (43), we obtain
that both $\hat q$ and $\hat R$ tend to infinity and that $r$ becomes $r \simeq \sqrt{\alpha}\, g_4/\sqrt{g_3}$, which is finite. Thus, in
this case the PL does not exist. On the other hand, when $P(y) = k < 1$ for $y > 0$,
$\hat R \propto \alpha\sqrt{\Delta q}$ tends to 0. Thus, the PL does not exist. Finally, in the deterministic case,
from (42) and (45), $\hat q = \hat R$ and $r = \sqrt{\hat q}$ tend to infinity since $g_{3,D}$ is finite. Hence, the
PL exists.

Therefore, summarizing the above results, we conclude that the PL exists for
$0 \le \delta < 1/2$ and $0 < \beta < \infty$, and for the deterministic case.
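The finite-temperature part of this argument reduces to checking the sign of three exponents of $\Delta q$. The sketch below tabulates them under the assumption that, for $0 < \beta < \infty$ and fixed $\alpha$, the conjugate variables scale as $\hat q \propto (\Delta q)^{-1/2}$, $\hat R \propto (\Delta q)^{(\delta-1)/2}$ and hence $r = \hat R/\sqrt{\hat q} \propto (\Delta q)^{(2\delta-1)/4}$; the function names are hypothetical, and the deterministic and $\beta = \infty$ cases are deliberately not covered.

```python
def divergence_exponents(delta: float) -> dict:
    """Exponents e such that each quantity scales as (Delta q)^e as Delta q -> 0,
    assuming the finite-temperature scalings quoted in the lead-in."""
    return {
        "q_hat": -0.5,
        "R_hat": (delta - 1.0) / 2.0,
        "r": (2.0 * delta - 1.0) / 4.0,
    }


def perfect_learning_possible(delta: float) -> bool:
    """True when q_hat, R_hat and r all diverge as Delta q -> 0 (every exponent negative)."""
    if delta < 0.0:
        raise ValueError("delta is assumed to be non-negative")
    return all(e < 0.0 for e in divergence_exponents(delta).values())


if __name__ == "__main__":
    for delta in (0.0, 0.3, 0.5, 1.0, 1.5):
        print(delta, divergence_exponents(delta), perfect_learning_possible(delta))
```

The binding condition is the one on $r$, whose exponent changes sign at $\delta = 1/2$; the condition on $\hat R$ alone would only require $\delta < 1$.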

As for the entropy $S_{PL}$ and the free energy $f_{PL}$ for the PL, we obtain the following
reasonable results,

$$S_{PL} = 0, \qquad f_{PL} = \alpha\epsilon_{\min}.$$

See Appendix D.

5. 1RSB solution

Although we adopt the Gibbs algorithm as the learning strategy, we are also interested
in the minimum-error algorithm. In the minimum-error algorithm we have to choose
weights with minimum errors, and for that purpose we only have to take the limit
$T \to +0$. However, as is shown in §3 by numerical calculations, in the RS solution
the entropy becomes negative for small $T$. Thus, we have to consider the breaking of
the replica symmetry. In the one-step RSB solution, the matrix $q^{\alpha\beta}$ is divided into
$(n/m)^2$ small matrices of dimension $m \times m$. The components of each off-diagonal
matrix are all $q_0$, and the components of each diagonal matrix are $q_1$ except for the diagonal
components, which have the value 0. Likewise, $\hat q_0$ and $\hat q_1$ are defined for the matrix $\hat q^{\alpha\beta}$. Further,
$R^\alpha = R$ and $\hat R^\alpha = \hat R$ are assumed. Then, the one-step RSB free energy $f_{1RSB}$ is derived
to be
-pfiRSBiqo, qi, 4i, R, /3) 

= - f (1 - qi) + y (gogo - mi) - RR (49) 



+ -J Dy2V{y) j Dz,\n j Dz,[H{ ^=== )] 

+ DzolnJ Dzi[2cosh{^ozo + ^ qi - %Zi + R)\^. 



Further, following Krauth and Mézard, we take the limits $q_1 \to 1$ and $\hat q_1 \to \infty$.
Then, we obtain

$$f_{1RSB}(q_0, \hat q_0, q_1 = 1, \hat q_1 = \infty, R, \hat R, m, \beta) = f_{RS}(q_0, m^2\hat q_0, R, m\hat R, \beta m). \qquad (50)$$

From this relation, the equations for $q_0$, $\hat q_0$, $R$, $\hat R$ and $m$ become the coupled equations
consisting of the saddle point equations for the RS solution and the equation $S_{RS} = 0$, where
$S_{RS}$ is the entropy of the RS solution. Let us denote the solutions of these coupled
equations by $q = q_c$, $\hat q = \hat q_c$, $R = R_c$, $\hat R = \hat R_c$ and $\beta = \beta_c$. Then, the one-step RSB
solutions are expressed by $q_0 = q_c$, $\hat q_0 = (\beta/\beta_c)^2\hat q_c$, $R = R_c$, $\hat R = (\beta/\beta_c)\hat R_c$ and $m = \beta_c/\beta$. Thus,
to obtain the $T \to +0$ limit we only have to know the solution at $T = T_c = \beta_c^{-1}$.
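The mapping from the zero-entropy RS solution to the one-step RSB solution is a simple rescaling, which the following sketch spells out. The function name and the dictionary keys are hypothetical, and the requirement $\beta \ge \beta_c$ (so that $m \le 1$) is an assumption of this sketch rather than a statement taken from the text.

```python
def one_step_rsb_from_rs(q_c: float, q_hat_c: float, R_c: float, R_hat_c: float,
                         beta_c: float, beta: float) -> dict:
    """Rescale the RS solution at the zero-entropy point (inverse temperature beta_c)
    into 1RSB parameters at inverse temperature beta >= beta_c."""
    if beta_c <= 0.0 or beta < beta_c:
        raise ValueError("this sketch assumes 0 < beta_c <= beta")
    ratio = beta / beta_c
    return {
        "q0": q_c,
        "q0_hat": ratio ** 2 * q_hat_c,
        "R": R_c,
        "R_hat": ratio * R_hat_c,
        "m": beta_c / beta,
    }


if __name__ == "__main__":
    # Illustrative numbers only; they are not solutions of the saddle point equations.
    print(one_step_rsb_from_rs(0.9, 4.0, 0.95, 6.0, beta_c=0.5, beta=2.0))
```

With these values, $m^2\hat q_0$, $m\hat R$ and $\beta m$ reproduce $\hat q_c$, $\hat R_c$ and $\beta_c$, which is exactly the argument substitution on the right-hand side of the relation between $f_{1RSB}$ and $f_{RS}$. Since $q_0$ and $R$ do not depend on $\beta$, the $T \to +0$ values of the overlaps are those at $T_c$.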



5.1. Numerical calculation of S.P.E. for the 1RSB solution

(I) $\delta > 0$

As a special case, we treated $P(y) = 1 - 2H(y)$, that is, the case of $\delta = 1$. This is the
same function as that used for the RS solution. In Figure 6, we show the $\alpha$ dependence
of $T_c$. See Figure 11 for the $\alpha$ dependence of $q_0$, $R$ and $\Delta\epsilon_g$. As is seen from these figures,
the 1RSB solution seems to extend to $\alpha = \infty$. To study the asymptotic behaviors, we assumed
the following relations for several quantities and estimated the coefficients $a_i$ and exponents
$b_i$ by the least squares method. That is, for $A = \Delta q$, $\Delta R$, $\Delta\epsilon_g$ and $T_c$ we assume

$$\ln A = a_1 + b_1\ln\alpha,$$

$$\ln A = a_2 + b_2\ln\left(\frac{\alpha}{\ln\alpha}\right),$$

and for $A = \hat q$ and $\hat R$

$$A = a_1 + b_1\ln\alpha,$$

$$A = a_2 + b_2\ln\left(\frac{\alpha}{\ln\alpha}\right).$$

Table 1. Coefficients $a_i$ and exponents $b_i$ estimated for $200 < \alpha < 381$, and theoretical
values $b_{2,th}$.

            Δq      ΔR      T_c     Δε_g    R̂       q̂
a_1         2.5     2.2     -1.8    1.1     -0.49   0.89
b_1         -1.5    -1.5    0.81    -1.5    1.1     0.28
a_2         1.1     0.88    -1.1    -0.27   0.50    1.1
b_2         -1.8    -1.9    0.99    -1.9    1.4     0.35
b_{2,th}    -2      -2      1       -2      1       1

In Table 1, we list $a_i$ and $b_i$ for these quantities. In particular, we note
that $T_c \to \infty$ as $\alpha \to \infty$. Further, we obtained $\Delta q/\Delta R \simeq 1.7$ and $\hat R/\hat q \simeq 2.5$. As an example,
we show the asymptotic behaviour of $\Delta\epsilon_g$ in Figure 7.
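The fitting procedure described above is an ordinary least squares fit of a straight line in logarithmic variables. The sketch below shows one way to carry it out; the data are synthetic, generated from an assumed law $A = 3(\ln\alpha/\alpha)^2$ purely to exercise the code, and are not the numerical results of this paper.

```python
import math


def linear_fit(xs, ys):
    """Least squares straight line y = a + b x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def fit_exponent(alphas, values, log_corrected=False):
    """Fit ln A = a + b ln(alpha), or ln A = a + b ln(alpha / ln(alpha)) if log_corrected."""
    if log_corrected:
        xs = [math.log(al / math.log(al)) for al in alphas]
    else:
        xs = [math.log(al) for al in alphas]
    ys = [math.log(v) for v in values]
    return linear_fit(xs, ys)


if __name__ == "__main__":
    alphas = [float(al) for al in range(200, 382, 10)]
    values = [3.0 * (math.log(al) / al) ** 2 for al in alphas]  # synthetic data
    print(fit_exponent(alphas, values))                      # pure power-law fit
    print(fit_exponent(alphas, values, log_corrected=True))  # with the logarithmic correction
```

On data of this form the plain fit in $\ln\alpha$ returns an exponent noticeably smaller in magnitude than 2 over the window $200 < \alpha < 381$, while the fit in $\ln(\alpha/\ln\alpha)$ recovers $-2$, which illustrates why both forms are tabulated.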

(II) $\delta = 0$

We treated the same function $P(y)$ as in the calculation for the RS solution,
$P(y) = k\,\mathrm{sgn}(y)$. We depict the $\alpha$ dependence of $T_c$ in Figure 8. See Figure 16 for the $\alpha$
dependence of $q_0$, $R$ and $\Delta\epsilon_g$. In the 1RSB solution, there exist two branches, I and
II. In branch I the quantities agree with those in the RS solution with $T = 0$ as
$\alpha \to \alpha_s(T = 0)$. On the other hand, $q$ and $R$ tend to 1 as $\alpha$ tends to 0 in branch II.

Within our calculation, it is difficult to determine which case, (a) $0 < \mu < 1$ or
(b) $\mu > 1$, takes place in branch II, since we could obtain solutions only for $\alpha > 0.45$.
As for $T_c$, it seems that $T_c$ is finite as $\alpha \to 0$. In both branches, $\lambda_1$ and $\lambda_3$ are negative,
that is, the solutions are AT-stable. As for the free energy, $f^{I}_{1RSB} < f^{II}_{1RSB}$ holds. See Figure 9.

5.2. Asymptotic behaviors

As is suggested by the above numerical results and will be shown later, the asymptotic
behaviors of $\beta_c$ are different in the cases of $\delta = 0$ and $\delta > 0$. Thus, we discuss these
cases separately.

(I) The case of $\delta > 0$

Guided by the numerical results, we consider the limit $\beta_c \ll 1$. For $\beta \ll 1$, $f_{RS}$ and
$S_{RS}$ are estimated in the asymptotic region as follows. See Appendix E.

-l3fRS= -\{l-q)-RR + I 
Srs = - lAq-RR + I- aP'^^. (52)

First, let us show that for $0 < \mu < 1$, no consistent 1RSB solution exists in the present
situation. The saddle point equations are

A, . HMim, (53) 

Vqc 



AR ^ ^, (54) 



$$\hat q_c = q_0\,\alpha\beta_c^2/\sqrt{\Delta q}, \qquad (55)$$
$$\hat R_c = R_0\,\alpha\beta_c\,(\Delta R)^{(\delta-1)/2}, \qquad (56)$$

^0 = ;=j = — ;=2^^^, 

27rV2 A 



where $\Delta q = 1 - q_c$ and $\Delta R = 1 - R_c$. Then, the condition that the entropy is zero
becomes

^^^m^^ (57) 

Rt 

From equations (0) and (57), we obtain = 2. Since we consider the case of $\hat R \gg 1$
and $\hat q \gg 1$, $\tau$ should be large. Thus, this case is inadequate. Therefore, in what follows,
we consider the case of $1 < \mu$. In this case, $I \simeq \hat R + a_\mu e^{-2(\hat R - \hat q)}$. Then, $f_{RS}$ and $S_{RS}$
become

-(3Jns ^ - |Ag + R^AR + a,e-2(^-^=) - a/?,(e, - -^V^), 



^Ag + 4Ai?-^^, 
2 ^ 2-kV2 

The saddle point equations for $\hat q$ and $\hat R$ are the same as in case (a), and those for $q$,
$R$ and the zero-entropy condition are

$\Delta R = 2a_\mu e^{-2(\hat R_c - \hat q_c)}$, (58)
$\Delta q = 2\Delta R$, (59)

Sns - - |Ag + R^AR - a(3%^ + a^e-2(^-'^=) = 0. (60) 

Since $\hat R_c \gg 1$ and $\hat q_c \gg 1$, using equations (55), (58) and (59), we obtain from equation
(60) the following relation,

$\hat R_c = 3\hat q_c$. (61)



That is, $\mu = 3/2$ and $a_\mu = 1$. Thus, from equations (0), (0) and (61), we obtain
that is,

$\ln \hat q_c = \ln\alpha - 2(2\delta - 1)\hat q_c + \ln F_0$,
where Fq = /9^/2. Thus,

$\hat q_c \simeq \frac{\ln\alpha}{2(2\delta - 1)}$

for $2\delta - 1 \neq 0$. This implies that $\hat q_c$ tends to infinity under the condition that $\alpha$ tends
to infinity for $\delta > 1/2$ or $\alpha$ tends to 0 for $\delta < 1/2$. For $\delta = 1/2$, $\hat q_c = F_0\alpha$. Therefore,
we obtain the following results.

(i) In the case of $\delta > 1/2$, as $\alpha \to \infty$,

In C( 2 

Ai?~2( )^, Aqc^2AR, (62) 

a 

A 3 , / a X ^ A , ^ 4:^y^^s c , In a , s . . 

^-2(2?3T)Mi^). 9c-fic/3. p^^^A—)^, (63) 

Ae, ^ e.(Aii)^ ^ e.2^(!;i^)«i, e. = —^2^. (64) 

a (l + ())v27r 

(ii) In the case of $\delta = 1/2$, as $\alpha \to \infty$,

$\Delta R \simeq 2e^{-4F_0\alpha}$, $\Delta q \simeq 2\Delta R$, (65)
4-3FoQ;, ?c-4/3, /3c ^ y V^e-^°", (66) 
$\Delta\epsilon_g \simeq \epsilon_0 2^{3/4} e^{-3F_0\alpha}$. (67)

(iii) In the case of $0 < \delta < 1/2$, as $\alpha \to 0$,

2 

Ai?~2(-^)T^, Ag~2Ai?, (68) 



a 



5 



A.,^.o2*(i^)ia. (70) 

a 

Thus, when $0 < \delta < 1/2$, for large $\alpha$, there is no solution such that $\hat q_c \gg 1$ and
$\hat R_c \gg 1$. This implies that there is a value $\alpha = \alpha_{\max}$ such that for $\alpha > \alpha_{\max}$ there is
no solution except for the PL when $0 < \delta < 1/2$.
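
The relation $\ln \hat q_c = \ln\alpha - 2(2\delta - 1)\hat q_c + \ln F_0$ has no closed-form solution, and the leading behaviour $\hat q_c \simeq \ln\alpha/[2(2\delta - 1)]$ is approached only slowly. The following is a minimal sketch, assuming $\delta \ge 1/2$ and an arbitrary illustrative value $F_0 = 1$. The function names and the bisection scheme are choices made here for illustration and are not taken from the paper.

```python
import math

def solve_qhat(alpha, delta, F0=1.0):
    """Solve ln(q) = ln(alpha) - 2(2*delta - 1)*q + ln(F0) for q > 0.

    Assumes delta >= 1/2, so that g(q) below is increasing and the root
    is unique. F0 = 1.0 is an illustrative placeholder, not a paper value.
    """
    c = 2.0 * (2.0 * delta - 1.0)
    if c < 0:
        raise ValueError("this sketch only covers delta >= 1/2")
    target = math.log(alpha * F0)

    def g(q):
        return math.log(q) + c * q - target

    lo, hi = 1.0, 1.0
    while g(lo) > 0:       # shrink until the root is bracketed from below
        lo /= 2.0
    while g(hi) < 0:       # grow until the root is bracketed from above
        hi *= 2.0
    for _ in range(200):   # plain bisection
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def leading_term(alpha, delta):
    """Leading asymptotic estimate ln(alpha) / (2(2*delta - 1)), delta != 1/2."""
    return math.log(alpha) / (2.0 * (2.0 * delta - 1.0))

if __name__ == "__main__":
    for alpha in (1e3, 1e6, 1e12):
        q = solve_qhat(alpha, delta=1.0)
        print(alpha, q, q / leading_term(alpha, 1.0))
```

In this sketch the ratio of the numerical root to the leading term stays below 1 even for very large $\alpha$, because the neglected $\ln \hat q_c$ term decays only logarithmically.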

Here, let us compare the theoretical results with the numerical ones on the asymptotic
behaviours for $\delta = 1$. As is shown in Table I, the numerically obtained
exponents and the theoretical ones agree fairly well except for those of $\hat q$ and $\hat R$. $\hat q$
and $\hat R$ depend logarithmically on $\alpha$, and it is difficult to estimate a logarithmic dependence
directly. Instead, we can check the theoretical relation $\hat R_c/\hat q_c = 3$. Numerically,
this value is 2.5. Further, as for the relation $\Delta q/\Delta R = 2$, we obtained numerically 1.7.
Therefore, we can conclude that the agreement between the theoretical and numerical results
is fairly good.
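
As a hedged illustration of why a logarithmic correction is hard to see in a log-log plot, the sketch below fits a straight line to $\ln\Delta\epsilon_g$ versus $\ln\alpha$ for synthetic data generated from the theoretical form $(\ln\alpha/\alpha)^{(1+\delta)/(2\delta-1)}$ with $\delta = 1$. The data, the range of $\alpha$ and the helper names are assumptions made for illustration. They are not the data behind Table I or Figure 7.

```python
import math

def loglog_slope(xs, ys):
    """Least-squares slope of ln(y) against ln(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    return sxy / sxx

def delta_eps_theory(alpha, delta):
    """Theoretical shape (ln(alpha)/alpha)^((1+delta)/(2*delta-1)), prefactor omitted."""
    exponent = (1.0 + delta) / (2.0 * delta - 1.0)
    return (math.log(alpha) / alpha) ** exponent

if __name__ == "__main__":
    alphas = [10.0 + i for i in range(91)]           # assumed range 10..100
    eps = [delta_eps_theory(a, 1.0) for a in alphas]
    # The pure power law would give slope -2; the logarithm makes it shallower.
    print(loglog_slope(alphas, eps))
```

On a finite range of $\alpha$ the fitted gradient is therefore expected to be smaller in magnitude than the asymptotic exponent, which is one reason to compare ratios such as $\hat R_c/\hat q_c$ instead.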

Now, let us examine the case $\delta = 0$. In this case, substituting $\delta = 0$ into the above
expressions for the case of $0 < \delta < 1/2$, $\beta_c$ becomes constant. That is, the assumption
$\beta_c \ll 1$ is not satisfied. This is the reason why we treat the cases $\delta > 0$ and $\delta = 0$
separately.

(II) The case of $\delta = 0$.

In §3, we examined the RS solution with $\beta$ fixed. Now, let us investigate the 1RSB
solution by imposing the condition $S_{RS} = 0$.

(a) $0 < \mu < 1$

In this case, $S_{RS}$ becomes

^ qAq /—— 
^RS - + ^y^Q r. 

Then, from the condition $S_{RS} = 0$ we obtain

^7i,o(x,/5c) = -2r(x,/5c). (71) 

From equations ( p^ and ([TTD , x and Pc are determined. The IRSB solution appears for 

/3>/3c- 

(b) $\mu > 1$

$S_{RS}$ is

qAq RAq r— 
Srs ^ Y~ 2 "V^^ 

Then, the condition $S_{RS} = 0$ becomes

$g_{1,0}(x = 0, \beta_c) = g_{2,0}(x = 0, \beta_c) + 2r(x = 0, \beta_c)$. (72)
If the solution $\beta_c$ of equation (72) exists, the 1RSB solution appears for $\beta > \beta_c$. The
condition for the existence of this type of solution is $g_{2,0}/g_{1,0} > 2$.

Numerical calculations for $\delta = 0$ indicate that $x$ tends to a constant and then the case
(a) appears. Thus, for $\alpha \ll 1$,

AR ~ -^ij(fic)a^(ln -)"^ Aq = 2ficAR, (73) 
V27r a 



R^ ^-ln[i(lnVl, Qc=^, (74) 

Ae, = eoJ%^a(lnl)-. (75) 
V ZTT a 

This implies that for $\delta = 0$, the PL takes place.

As for the validity of the above asymptotic solutions for any $\delta > 0$, since the
coefficient of any quantity does not contain $\beta$, there is no condition for the range of $\beta$.

Thus, in the case of $0 < \delta < 1/2$, there exists no solution for $\alpha > \alpha_{\max}$. This is consistent
with the result derived in §4 that the PL exists for $0 \le \delta < 1/2$ when $0 < \beta < \infty$.



Putting together all results obtained in this paper, we get the following behaviours 
of learning. 

When $T$ is small enough, there is a critical value of $\alpha$, $\alpha_s(T)$, above which the
entropy of the RS solution becomes negative. Thus, for $\alpha > \alpha_s(T)$, the 1RSB solution
appears. Within the 1RSB ansatz, we found that the behaviour of the generalization
error $\epsilon_g$ is classified into the following three categories according to the value of $\delta$.

(i) If $0 \le \delta < 1/2$, solutions with $R < 1$ exist only for a finite range of $\alpha$, $[0, \alpha_{\max}]$.
There is a critical temperature $T_c$. When $T > T_c$, the entropy of the RS solution
is positive and this solution is AT stable. When $T < T_c$, for $\alpha > \alpha_s(T)$, the 1RSB
solution appears. In both cases, at $\alpha = \alpha_{\max}$, a first-order phase transition from
the RS solution with positive entropy or from the 1RSB solution to the Perfect
Learning takes place.

(ii) If $\delta = 1/2$, $\alpha_s(T)$ is defined for any temperature $T$, and the 1RSB solution appears for
$\alpha > \alpha_s(T)$. $\epsilon_g$ for the 1RSB solution decays exponentially,

$\Delta\epsilon_g \sim e^{-3F_0\alpha}$,

where $F_0$ is a constant.

(iii) If $\delta > 1/2$, for any temperature $T$, $\alpha_s(T)$ is defined and the 1RSB solution appears for
$\alpha > \alpha_s(T)$. $\epsilon_g$ for the 1RSB solution decays as a power law with a logarithmic
correction,

$\Delta\epsilon_g \propto \left(\frac{\ln\alpha}{\alpha}\right)^{\frac{1+\delta}{2\delta-1}}$.
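
The three categories can be summarized in a few lines of code. The following sketch only encodes the classification stated above. The function name and the return format are illustrative choices, and the lower category is taken to include $\delta = 0$, for which the text also finds the PL.

```python
def learning_regime(delta):
    """Classify the asymptotic learning behaviour by the exponent delta.

    Returns a (description, exponent) pair; the exponent is the power of
    ln(alpha)/alpha in the power-law regime and None otherwise.
    """
    if delta < 0:
        raise ValueError("delta must be non-negative")
    if delta < 0.5:
        return ("first-order transition to Perfect Learning at alpha_max", None)
    if delta == 0.5:
        return ("exponential decay of the generalization error", None)
    exponent = (1.0 + delta) / (2.0 * delta - 1.0)
    return ("power law with logarithmic correction", exponent)

if __name__ == "__main__":
    for d in (0.0, 0.25, 0.5, 1.0, 2.0):
        print(d, learning_regime(d))
```

For $\delta = 1$, the case treated numerically in the next section, the exponent of $\ln\alpha/\alpha$ is 2.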

To check these theoretical results, we performed numerical calculations. In the next 
section, we give the results of the calculations. 

6. Numerical Calculations by exhaustive method 

We performed numerical calculations by the exhaustive method for $\delta = 0$ and $\delta = 1$. We
used the minimum-error algorithm and the Gibbs algorithm for several temperatures.
We calculated several quantities such as $q$, $R$, $\epsilon_g$, etc. For example, $q$ and its standard
deviation $\delta q$ are calculated by the following formulas.

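$q = \frac{1}{N_s}\sum_{\xi} \langle q\rangle_\xi$, $\langle q\rangle_\xi = \sum_{\alpha\neq\beta} q^{\alpha\beta} P_\alpha P_\beta \Big/ \sum_{\alpha\neq\beta} P_\alpha P_\beta$, (76)

$\delta q = \sqrt{\Big[\sum_{\xi} \langle q\rangle_\xi^2 - \big(\sum_{\xi} \langle q\rangle_\xi\big)^2/N_s\Big]\Big/(N_s - 1)}$, (77)

$P_\alpha = e^{-\beta E_\alpha} \Big/ \sum_{\alpha} e^{-\beta E_\alpha}$, (78)

where $\alpha$ denotes one of the $2^N$ configurations of weight vectors and $E_\alpha$ is its energy,
$\langle\cdot\rangle_\xi$ is the thermal average for a given set of examples $\xi$ and $N_s$ is the number of samples. The
calculations were performed for $N$ up to 20 with $N_s = 200$. First, let us show the results
for $\delta = 1$.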
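
The exhaustive method described above can be sketched for very small $N$ as follows. This is a minimal illustration under stated assumptions, not the program used for the figures: inputs are taken to be Gaussian, the teacher is fixed to the all-plus vector, the energy is taken to be the number of misclassified examples, the thermal overlap is computed over pairs of distinct configurations, and all function names are hypothetical.

```python
import itertools
import math
import random

def enumerate_sample(N, alpha, beta, rng):
    """Return (R, q) for one random set of examples by enumerating 2^N weights."""
    p = max(1, round(alpha * N))              # number of examples
    teacher = [1] * N                         # assumed teacher vector
    examples = []
    for _ in range(p):
        x = [rng.gauss(0.0, 1.0) for _ in range(N)]
        # delta = 1 case: each input component is corrupted by unit Gaussian noise
        noisy = sum(t * (xi + rng.gauss(0.0, 1.0)) for t, xi in zip(teacher, x))
        examples.append((x, 1 if noisy >= 0 else -1))

    configs = list(itertools.product((-1, 1), repeat=N))
    energies = []
    for w in configs:
        errors = 0
        for x, label in examples:
            field = sum(wi * xi for wi, xi in zip(w, x))
            if (1 if field >= 0 else -1) != label:
                errors += 1
        energies.append(errors)               # assumed energy: training errors

    e_min = min(energies)
    boltz = [math.exp(-beta * (e - e_min)) for e in energies]
    Z = sum(boltz)
    P = [b / Z for b in boltz]                # Gibbs weights

    R = sum(Pa * sum(wi * ti for wi, ti in zip(w, teacher))
            for Pa, w in zip(P, configs)) / N
    # Sum over ALL pairs of q^{ab} P_a P_b equals (1/N) * sum_j m_j^2.
    m = [sum(Pa * w[j] for Pa, w in zip(P, configs)) for j in range(N)]
    all_pairs = sum(mj * mj for mj in m) / N
    diag = sum(Pa * Pa for Pa in P)           # pairs with a == b, where q^{aa} = 1
    q = (all_pairs - diag) / (1.0 - diag) if diag < 1.0 else float("nan")
    return R, q

def mean_and_std(values):
    """Sample mean and standard deviation with the (N_s - 1) normalization."""
    n = len(values)
    s1 = sum(values)
    s2 = sum(v * v for v in values)
    return s1 / n, math.sqrt(max(0.0, (s2 - s1 * s1 / n) / (n - 1)))

if __name__ == "__main__":
    rng = random.Random(0)
    results = [enumerate_sample(7, 2.0, 1.0, rng) for _ in range(20)]
    print("R:", mean_and_std([r for r, _ in results]))
    print("q:", mean_and_std([q for _, q in results]))
```

The cost grows as $2^N$, which is why such calculations are limited to $N$ of order 20. When a single configuration carries almost all the weight, the pair average is dominated by very few pairs, and the sketch returns `nan` if no distinct pair has weight.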
(I) $\delta = 1$

The $i$-th component of an example, $x_i$, is corrupted by a Gaussian noise $\eta_i$ with mean
0 and standard deviation 1. This corresponds to $P(y) = 1 - 2H(y)$. First, we show
the results for the minimum-error algorithm. In Figure 10, to see the system size ($N$)
dependence of the quantities, we show the $\alpha$ dependence of $R$ and its standard deviation
$\delta R$ for $N = 10$, 15 and 17. From these results, it seems that calculations with $N = 15$
are sufficient to obtain the $N = \infty$ results at least for $\alpha$ up to 15. In Figure 11, we compare
the numerical results with the theoretical ones. The agreement between the theoretical and
numerical results is fairly good except for $q_0$. As for $q_0$, we calculate it in each sample by
the formula (76). In general, $q_0$ exhibits a large finite size effect, because it is calculated
for pairs of states. Thus, when the number of states with the minimum energy becomes
small, the fluctuation of $q_0$ becomes very large. This is the reason why the agreement
between the numerical and the theoretical results for $q_0$ is worse than for other quantities.
A more suitable quantity is the distribution of $q$, $P(q)$. We calculate this in the case of the
Gibbs algorithm. In Figure 12 we show the $\alpha$ dependence of $R$ for larger values of $\alpha$ in
the case of $N = 15$. For $\alpha > 85$, there exists only one state. Since theoretically $R$ tends
to 1 as $\alpha$ goes to $\infty$, to see whether this is a finite size effect or not we estimated the
value of $\alpha_{\max}$ by the condition that $R$ exceeds $1 - 1/N$ for the first time as $\alpha$ increases.
Then, we found that $\alpha_{\max}$ increases as $N$ increases. Thus, it seems that we observed
a finite size effect.
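
The correspondence between unit Gaussian input noise and $P(y) = 1 - 2H(y)$ stated at the beginning of this paragraph can be checked with a short sketch. It assumes the usual definition $H(x) = \int_x^\infty Dt$ with Gaussian measure $Dt$, and that the noise acts on the normalized field $u$ as a unit-variance Gaussian variable. The function names and the Monte Carlo check are illustrative additions.

```python
import math
import random

def H(x):
    """Gaussian tail probability, H(x) = integral from x to infinity of Dt."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def P(y):
    """Odd function P(y) = 1 - 2H(y) for the Gaussian input-noise model."""
    return 1.0 - 2.0 * H(y)

def prob_plus(u):
    """Probability (1 + P(u)) / 2 that the teacher outputs +1 at field u."""
    return 0.5 * (1.0 + P(u))

def simulate_prob_plus(u, n_trials, rng):
    """Monte Carlo estimate: fraction of trials with u + noise >= 0."""
    hits = sum(1 for _ in range(n_trials) if u + rng.gauss(0.0, 1.0) >= 0)
    return hits / n_trials

if __name__ == "__main__":
    rng = random.Random(1)
    print(prob_plus(0.5), simulate_prob_plus(0.5, 200000, rng))
    # Near y = 0, P(y) is linear in y, which corresponds to delta = 1.
    print(P(1e-4) / 1e-4, math.sqrt(2.0 / math.pi))
```

The linear behaviour near $y = 0$, with slope $\sqrt{2/\pi}$, is what places this noise model in the class $\delta = 1$.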

Next, we show the results for the Gibbs algorithm. In this algorithm, we calculated
for $N = 10$ and 12 and for several temperatures, and we took all the states into account.
We confirmed that the results for $N = 10$ and 12 are almost the same. We show both
numerical and theoretical results in Figure 13 for $R$ and $\Delta\epsilon_g$ and in Figure 14 for $q$ and
$q_0$. The distribution of $q$, $P(q)$, is also shown for several temperatures together with
the theoretical results in Figure 15. $P(q)$ is calculated by

$P(q) = \Big\langle \sum_{\alpha,\beta} \delta(q, q^{\alpha\beta}) P_\alpha P_\beta \Big\rangle$,

where $\delta(q, q^{\alpha\beta})$ is the Kronecker delta and $\langle\cdot\rangle$ means the sample average. From
these, we see that the agreement between the theoretical and numerical results is fairly good.
(II) $\delta = 0$

The output by a teacher is reversed with the probability (1 — A;)/2 with k = ^, that 
is, we treat the same case as before. First, we show the results for the minimum-error
algorithm. We investigated the $N$ dependence of $R(\alpha)$ and $\delta R(\alpha)$ for $N = 10$, 15, 17
and 20 and found that the results for $N = 15$, 17 and 20 are almost the same. In Figure 16,
the $\alpha$ dependence of several quantities is depicted for $N = 17$ together with the theoretical
results. In the figure of $q$ and $q_0$ (Figure 16(b)), we note that $q_0$ takes values at $\alpha$ above
the theoretical upper bound of $\alpha$, $\alpha_{\max}$. This is due to the finite size effect mentioned
above. In Figure 17 we show the behaviour of $R$ for larger values of $\alpha$ in the case of
$N = 15$. For $\alpha > 22$, there exists only one state. In the case of $N = 10$, this occurs for
$\alpha > 25$. To investigate whether the PL exists or not, we numerically estimated $\alpha_{\max}$
by the same method as in the case of $\delta = 1$. Contrary to the case of $\delta = 1$, $\alpha_{\max}$
decreases as $N$ increases. Thus, we conclude that the PL takes place even when $N = \infty$.
Theoretically, $\alpha_{\max}$ is about 9.13.

Next, we show the results for the Gibbs algorithm. In this algorithm, we calculated
for $N = 10$ and 12, and for several temperatures. We took all the states into account.
We confirmed that $N = 12$ is sufficient for convergence for any $T$. In Figure 18, the $\alpha$
dependence of $R$ and $\Delta\epsilon_g$ is shown for $T = 1.0$. Also, the $\alpha$ dependence of $q$ and $q_0$
is depicted in Figure 19 for $T = 0.15$, 0.5 and 5.0. As is shown in these figures,
the agreement between the theoretical values and the numerical ones is fairly good except
for the case of $q_0$. Concerning $q_0$, we further calculated $P(q)$ at $\alpha = 5$ for several
temperatures. In Figure 20, we show the numerical results of $P(q)$ together with the
theoretical ones. The positions of the peaks agree between the theoretical and numerical
results.

In conclusion, in both cases of $\delta = 1$ and 0, although there exist finite size effects as
is seen for $q_0$, as a whole the theoretical results and the numerical ones agree fairly well within
the 1RSB ansatz.

7. Summary and Discussion 

In this paper, we studied the learning from stochastic examples by perceptrons with
Ising weights. By using the replica method, we obtained the condition for the existence
of the Perfect Learning and the power law of learning curves in the asymptotic region, in
terms of $\delta$, which represents the local property of the rules by which examples are drawn.
First, let us summarize the results in more detail.

Our assumptions are as follows. 
When an input vector $x$ is given, the probability $\Pr(+1|x)$ that a teacher returns an
output $+1$ is a function of the inner product between the input $x$ and the teacher's
weight $w^o$ and takes the following form,

$\Pr(+1|x) = \mathcal{P}(u^o) = \frac{1 + P(u^o)}{2}$,

$u^o = (x \cdot w^o)/\sqrt{N}$, $|x| = \sqrt{N}$, $|w^o| = \sqrt{N}$.

Further, we assume that $P(y)$ is non-decreasing and that near $y = 0$ it behaves as
$P(y) \simeq a\,\mathrm{sgn}(y)|y|^{\delta}$ $(\delta \ge 0)$. For simplicity, we assumed that $P(y)$ is an odd function.
Under the above assumptions, we obtained the following results.
Conditions for the PL

The necessary and sufficient conditions for the existence of the Perfect Learning
are

(a) $0 \le \delta < 1/2$ when $0 < \beta < \infty$,

(b) the deterministic case.
Behaviour of learning curves

Within the 1RSB ansatz, we found that the behaviour of the generalization error
$\epsilon_g$ is classified into the following three categories according to the value of $\delta$.

(i) If $0 \le \delta < 1/2$, at $\alpha = \alpha_{\max}$, a first-order phase transition from the RS solution with
positive entropy or from the 1RSB solution to the Perfect Learning takes place.

(ii) If $\delta = 1/2$, for large $\alpha$ the 1RSB solution appears and $\epsilon_g$ for the 1RSB solution decays
exponentially,

$\Delta\epsilon_g \sim e^{-3F_0\alpha}$,
where $F_0$ is a constant.
(iii) If $\delta > 1/2$, for large $\alpha$ the 1RSB solution appears and $\epsilon_g$ for the 1RSB solution decays as
a power law with a logarithmic correction,

$\Delta\epsilon_g \propto \left(\frac{\ln\alpha}{\alpha}\right)^{\frac{1+\delta}{2\delta-1}}$.

To check these results, we performed several numerical calculations, namely the
calculations of the saddle point equations for the RS and the 1RSB solutions and the
direct calculations of the quantities concerned by enumeration methods. The numerical and
theoretical results showed fairly good agreement. As is mentioned in the introduction,
Seung also investigated the existence of the Perfect Learning when the weights are Ising
and a rule to be learnt is stochastic [13] by the annealed approximation. He classified the
learning behaviour of Ising networks by introducing the following two exponents $y$ and
$z$. The first exponent $y$ is associated with $p(\epsilon_g)$, which is the logarithm of the number
of weight vectors whose generalization errors have the value $\epsilon_g$. He assumed that when
$\Delta\epsilon_g = \epsilon_g - \epsilon_{\min}$ is small, $p(\epsilon_g)$ increases as $p(\epsilon_g) \sim O((\Delta\epsilon_g)^y)$, where $\epsilon_{\min}$ is the minimum
value of the generalization error obtained by the unique optimal weight vector $w^o$. The
second exponent $z$ is introduced to characterize $\epsilon_d(w, w^o)$, which is the probability that
the output for the weight vector $w$ differs from that for the optimal weight vector $w^o$.
He also assumed that $\epsilon_d(w, w^o)$ scales as $\epsilon_d(w, w^o) \sim O((\Delta\epsilon_g)^z)$. He estimated
the upper bounds for the generalization errors and found that the behaviour of learning
curves varies according to the values of the indices $y$ and $z$. His results are summarized as
follows.

(i) If $y + z > 2$, there is a first-order transition.

(ii) If $y + z < 2$, the generalization error decays as a power law, $\Delta\epsilon_g \sim \alpha^{-1/(2-y-z)}$.

(iii) If $y + z = 2$, there is a second-order transition or the generalization error decays
exponentially.

In our model, the exponents y and z are expressed a.sy = -j^, z = = | respectively, 
and then v = -rfr- Therefore, 6 = follows and it is found that our results on the 
typical learning behavior agrees with Seung's results which are the upper bounds of the 
learning curves. 

As for the condition of the existence of the PL, we note that for $\beta = \infty$, i.e. $T = 0$,
the PL does not exist in the learning from stochastic examples. The reason is that
for $T = 0$ and for large $\alpha$ there exists no student whose outputs are the same as the
teacher's, since the teacher makes mistakes. Thus, the volume of weight vectors whose
energies are 0 vanishes for large $\alpha$. On the other hand, for the case of $T \to +0$, we
consider the weight vectors of the minimum energy, and there is at least one solution,
$w = w^o$, when $\alpha$ is large enough. Thus, the PL is possible in the limit $T \to 0$.

As the learning advances, $w$ tends to $w^o$. The examples which have the crucial
influence on the learning are those with $u_0 = (x \cdot w^o)/\sqrt{N} \simeq 0$. For larger $\delta$, the
probability $\mathcal{P}(u)$ varies more slowly around $u = 0$, and it becomes more difficult for students
to tune to the optimal vector $w^o$. This is the reason why the larger $\delta$ is, the more difficult it
becomes for the Perfect Learning to take place.
Appendix A. Derivation of free energy

Here, we derive the free energy by the replica method. Introducing n replicas, the 
partition function becomes 

= Tr n [e-^ / rfA^ + / ^A^] / ^ exp[-iy^(rX - A^)], (A-l) 

J—oo JO J—oo ZTT 



where -u^ = (x^ ■ w°')/yN and Tr implies the summation over all configurations of 
w'^^a = 1, ■ ■ - n. Defining the overlap between the weight vector of a learner and the 
optimal weight vector, R"' = Sj^i ' ^^"^ overlap between the weight vectors 

of learners, q""^ = jj YJj=i wfWj, and using the relations 

j=l J-ioo ZTTZ 

/fioo Mrln^l^ 1 ^ 

n / . ^ eM-Nr%-^ - ^ E )], 

a</3 j=l 

we take the average over r°, and x'^, and obtain the expression for < Z" >^p,w°, 

< = JiUdq<^'^mdR"^]e''^, (A.2) 



/O roo fc 

dx-+ dX-) 
-oo JO J —( 



a a</3 

dr 

2n ^ 



X exp[- - E rtV + iE?/"Ai*(Ey"^"), (A.4) 

a a<l3 ce a 

= Tr expE ^"w" + E Q^'^w'^w^ (A.5) 

a a<l3 



/27r 

where Tr implies the summation over w", a = 1, ■ ■ -n. In the above expressions we set 
— 1/2 which is the optimal value. When P{—y) — —P{y), ^(y) becomes 

= / d^e-'^^^-'y^'2V{-0. (A.6) 
The general form of the free energy per weight is given by
_ <hiZ>5^ _ 



Appendix B. Derivation of asymptotic relations for $\hat q$ and $\hat R$ when $q \to 1$ and $R \to 1$

In this appendix, we briefly derive the asymptotic relations for $\hat q$ and $\hat R$.

First, let us consider the case of < /3 < oo. The equation (|^) for q is rewritten 
as follows. 

A= [due-^'-'/'i^, (B.2) 
B = -==—-= / DzP{ez) / DtH^Ht + -W— ^^)],(B.3) 



where if2(M) = - h{-uY ' = "7^+2? ^"^"^ ^ " V%S~- follows that iJ2(M) is 
strictly increaseng odd function and < \H2{u)\ < e^^ — 1 for m 7^ 0. Thus, for 5 > 0, 



P{ez) can be replaced by a{ez) in the equation ( p.3| ). 

5 ~ / Z}2a(£2)^ / DtH^Ht + ^2)] + 
Z^Tx Jo J-00 v2x 

As $q$ and $R$ tend to 1, $\epsilon$ tends to 0 and then $B \to 0$. On the other hand, for $T > 0$, $A$ is
finite in these limits. Thus, $A - B \simeq A$. Therefore, we obtain for $\delta > 0$,

(B.4) 



g,^six.P) = ^ I du^iu)\ (B.5) 
V 27r 



where $\Delta q = 1 - q$. For the case of $\delta = 0$, from equation (B.3), we obtain

B ~ r Dzk r DtH2\ , ^ ^(t + -^z)] = k I du^S^\l - 2H(u/x)], 

where $k = \lim_{y \to +0} P(y)$. Then, we obtain

q ~-^(7i,o(x,/5), (B.6) 

^7i,o(x, P) = ^ f du^iufll -k + 2kH{u/x)]. (B.7) 
y/ Zn -J 

Now, let us estimate R. The equation (^) is rewritten as 

. ^ ^(l^n, (B,8) 

q^ V2tt 
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D = I rfMe-«'"'/2 = u r DxHiiux) H DyP{^y)e-''^''y{y + r/x), (B.9) 

J H(u) J-'^ Jo 



1 1 ^ / 1 - 



H{x) Hi-xY v^FW ' "Ve + Q^ 

For 6 > 0, D is calculated as 



where ( = ^J^^^ and il){z) = DtHi{vt — z). Since ip{z) is bounded, D is evaluated 
as follows. 

D^i\\ Dza5{CzY-'^{^==^) + 0{Hil/C)}] ^ ^a5C'-' / 
That is, we obtain 

R^a-j^g2,s{x,P), (B.IO) 
92AX, f3) - ^(1 - e-0-(l + x-'f-'^^' r /^./-V(^^)(B.ll) 



For $\delta = 0$, from equation (B.9), we obtain

-OO v2vr "'0 

We assume that \P'{ii)\ is bounded. || 

Then the second term in the parenthesis is evaluated as and is neglected. 

Then, 

Dc^J^r DxHAux) = , r Dx^ ^ 



Thus, we obtain 



i?~a^^2,5, (B.12) 

Now, let us consider the case of /9 = oo. In this case, (p{u) becomes = -f^- 
Therefore, from the equation (Jl^) we obtain 



C POO 

\/2txO^ Jo 



X {2^1 - eey + V2^r + (i - e)ey']e-^'[i - 2h{^i - ey)]}- (b.m) 

§ Here, we assumed the boundedness of $|P'(y)|$. However, we can obtain the same results as those
obtained here without using $P'(y)$.
In the expression ( [B.14| ), the first term in the parenthesis is 0{^^^^ /Q'^) and the second
term is evaluated as ^ f^ DyP{y)y'^. Thus, A - B' ^ ^ f^ Dy{l - P{y))y'^. Then, 

If $P(u) = 1$, that is, in the deterministic case, this gives 0. Thus, we discuss this case
later. Similarly, from equation (13) for $\hat R$ we obtain



{^\ DyP{iy)y-i^^^ \ DyP{y){\ - y'){\ - 2HQL^y)^^^ 
^J^ DyP{y){y^-l)^—g,. (B.16) 



a 



If $P(y)$ is not constant for $y > 0$, the integral is positive. If $P(y)$ is constant for
$y > 0$, which can happen when $\delta = 0$, the integral is 0. In the latter case, we can
perform the exact calculation and obtain



r DyP{^y)F{y) = M'^-e 

Jo TT 



10 TT TT 

where k = P{y) for y > 0. Therefore, 

(B.17) 

Finally, let us consider the deterministic case. In this case, 5 = and k = 1 and q = R 

and q = R hold. Then, we obtain = ^ and E{u/Q) = 2H{u/Q). Thus, equation 

( |T^) becomes 

2a f ^ hiu/QY « 2 /■ ^ hiu) a 



As for R, since v = —u, from equation (|T2D we obtain exactly 

R = - Du =g. B.19 

1- q J H[u/Q) 

Appendix C. Asymptotic form of $I$ for $\tau \gg 1$ and $\hat R \gg 1$

I is expressed as 

/ = /Dtln[2cosh(v^t + ^)] + ^ + (C.l) 

J ^ TH 

If = r Dt\Yi[l + e-2v^(*^")] = V2^h{T) r Dxe^""^ ln(l + e'^v^^), 

J±T Jo 

where $\tau = \hat R/\sqrt{\hat q}$ and $\mu = \hat R/(2\hat q)$. The following relations are proved mathematically exactly,

oo ( 1 \n~l , „ 

n=l ^ 1^ 
For < /i < 1, in equations ( |C'.2| ), H{t{^ — 1)) and H{t(J^ + 1)) are approximated by
'^'^l'^n_l^^ and -^^^^^jy-, respectively. Then, Ii is estimated as 

n=l V / ^ fi ' 

where $c(\mu) = \sum_{n=1}^{\infty} \frac{(-1)^{n-1}}{n^2}\,\frac{1}{1 - (\mu/n)^2}$. When $\mu$ is not an integer,

$c(\mu) = \frac{\pi}{2\mu\sin(\mu\pi)} - \frac{1}{2\mu^2}$.



Thus, 



/^f?+^[l + 2A(/^)]=^+^^^M^ forO</.<l, 
r/i r/i 



sin(7r/i) 

When $\mu = 0$, $c(0) = \pi^2/12$ and $\psi(0) = 1$. Then,

/ ~ ^ + for ~ 0. 
r/i 
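
The closed form of $c(\mu)$ used above can be checked numerically against its defining series. The sketch below assumes the series $c(\mu) = \sum_{n \ge 1} (-1)^{n-1}/(n^2 - \mu^2)$, which is the definition above with the factor $1/n^2$ absorbed, and the relation $\psi(\mu) = 1 + 2\mu^2 c(\mu)$. The function names and the truncation length are illustrative choices.

```python
import math

def c_series(mu, n_terms=100000):
    """Partial sum of sum_{n>=1} (-1)^(n-1) / (n^2 - mu^2), mu not an integer >= 1."""
    total = 0.0
    sign = 1.0
    for n in range(1, n_terms + 1):
        total += sign / (n * n - mu * mu)
        sign = -sign
    return total

def c_closed(mu):
    """Closed form pi / (2 mu sin(pi mu)) - 1 / (2 mu^2) for non-integer mu."""
    return math.pi / (2.0 * mu * math.sin(math.pi * mu)) - 1.0 / (2.0 * mu * mu)

def psi(mu):
    """psi(mu) = 1 + 2 mu^2 c(mu), which equals pi*mu / sin(pi*mu)."""
    return 1.0 + 2.0 * mu * mu * c_closed(mu)

if __name__ == "__main__":
    for mu in (0.3, 0.5, 1.5):
        print(mu, c_series(mu), c_closed(mu))
    print(c_series(0.0), math.pi ** 2 / 12.0)   # the mu -> 0 limit
```

Because the series is alternating with terms of order $1/n^2$, the truncation error after $n$ terms is bounded by roughly $1/n^2$, so the default truncation is more than sufficient for a check of this kind.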

For $\mu > 1$, $I_\pm$ is expressed as

n=l ^ ^ T n=no+l ^ fi ^ 

where $n_0 = [\mu]$, i.e., $n_0$ is the largest integer which does not exceed the value of $\mu$. Let
us compare terms in equation (C.3). Let us assume $1 \le n_1 < n_2 \le n_0$. Then,

g2(jn,J-2Rni j ^2qn\-2Rn2 _ g4(j(n2-ni)(/i-(n2-ni)/2) ^ g4(}(n2-ni)(?io-ni) 

Thus, we obtain 

g ^ 1 is satisfied when _R ^ 1 as long as /x = ^ is bounded from the above. Further, 

since e~'^^/^/e^'^"'^~^^" = e"^'^'-""^-'^, each term in the first summation in equation ( p.3| ) 
is lower order than /i(r)/r for r ^ 1. Thus, and the second term in /f are higher 
order than terms in the first term in Jf. Then, for 1 < < 2, 



Thus, 



r 

where c(/i) = c~(/i) + c+(/i). Therefore, 

/^f? + e-(^-) + ^^^^forl</.<2. 
For $\mu = 1$, we obtain

T 

where C2(/x) is defined as 



2)" ^2 ^ ^-1 2(1-/.) 
Thus, 

J ~ i? + er^(^-^V2 for /i = 1. 

We use the facts that $c_2(\mu)$ is analytic at $\mu = 1$ and $c_2(1) = 0$.
For $\mu > 2$,



then, 



Thus, 



I- ~ e-2(^-^) - ie-^(^-2^)i/[T(- - 1)], 
2 ^ 



6^ = lfor/x>2, 62 = 1/2. 



Ic^R + e-2(^-^) - ^6^e-"(^-2^) for > 2. 



In summary, up to the second order term in $I$, we obtain

$I \simeq \hat R + a_\mu e^{-2(\hat R - \hat q)}$ for $1 \le \mu$, (C.5)

$\psi(\mu) = 1 + 2\mu^2 c(\mu) = \frac{\pi\mu}{\sin(\pi\mu)}$, $a_1 = 1/2$, $a_\mu = 1$ for $\mu > 1$. (C.6)

Appendix D. $S_{PL} = 0$ and $f_{PL} = \alpha\epsilon_{\min}$

As is shown in equation (3.84), when $q \to 1$ and $R \to 1$ for $0 < \beta < \infty$ and for any $\delta$
and any $x$, $S_{RS}$ is expressed as

Srs^ - ^-(l-AR) R + a^qrix,P)+I. (D.l) 

We consider the case of $0 < \beta < \infty$ and $0 \le \delta < 1/2$.
For the PL, R = q = l,x = ^ and ^ = Q = V ^Q- Then, 

q ^^^9iAx = l:P): (D.2) 

^ = a(Ag)(^-^)/V(x=l,/3). (D.3) 

For < /3 < 00 and 5 > 0, ^1,5(1, /3) and ^2,5(1, /3) are finite. Then, A* = § = ^^(Ag)'^/^ 
becomes 

~ for 5 > 0, 
/i = ^^''^ = finite for 5 = 0. 

2^1,5 
Let us estimate /. In the case of 5 > 0, since /i = we obtain from ( P'.4D



R+--^ = R+—h{T) 



Then, 



■^5 

S = --gisaJ Aq + g2,5a{Aq)^ + -^h{T) + arJ Aq. 

2 V -pj V 

From the equations ( p.2|) and ( [D.3|) , we obtain 

q'~'R = {ag^,5Y~^ag2,5 = C. 

Then, 

1 r J? 1 4(5-1 

R = Cq^-\ =C^ — rT=2-6_ 

Since Ag = 0, r = cxd and gi,s,g2,s, r and C are finite, we obtain 5* = 0. For the case 
of 5 = 0, we have to determine the value of fi. As is discussed in §3, when x is finite, 
fi = from equation ([361) . In our case, x = 1 and then /x = 1/2. Thus, 

r 

By a similar argument to the case of $\delta > 0$, we obtain $S = 0$. Now, let us estimate
$\langle\epsilon_t\rangle = -\alpha e^{-\beta} J$. $J$ is estimated as

J H{u/Q) Jo 

Thus, $\langle\epsilon_t\rangle = -\alpha e^{-\beta} J = \alpha\epsilon_{\min}$. Then, we obtain $f_{PL} = \langle\epsilon_t\rangle - T S_{PL} = \alpha\epsilon_{\min}$.

Appendix E. Asymptotic form of $f_{RS}$ and $S_{RS}$ for $\beta \ll 1$.

In this appendix, we derive the asymptotic forms of the free energy and the entropy for
the RS solution for $\beta \ll 1$.
$f_{RS}$ is expressed as

$-\beta f_{RS}(q, \hat q, R, \hat R, \beta) = -\frac{\hat q}{2}(1 - q) - R\hat R + \alpha K + I$, (E.1)

K = J Dy2V{y) J Du\nH{Y), 1 = j Dt\n.[2cosh{^qt + R)], 

9u-Ry 

By defining $K_a$ and $K_b$ as follows,



where Y = V^^-^y , 



Ka= J Dy2V{y) J DuH{-Y) = e„,„ + 2 DyP{y)H{ 
Kb = J Dy2V{y) J DuH{Y)H{-Y) = Q J DuH{u)H{-u) 



Ry , 
$K$ and $f_{RS}$ are expressed as

-PfRS ^ - \{l-q)-RR + I- a(5{K, - ^K,) + 0{(5'') 

~ -'i{l-q)-RR + I-a(3eg + ^K,. 
The entropy Srs is expressed as 



Sjis= - ^{l - q) - RR + I + aK - ape-^J, (E.2) 

HiY)-l 



J Dy2V{y) J Du- 



H{Y) 

Then, defining L as L = K — (3e~'^J, we get 



?2 



Thus, we obtain 

Sns = -|Ag -RR + I- ^K, + 0{(5^). 
For $\Delta q \ll 1$ and $\Delta R \ll 1$,

Then, $f_{RS}$ and $S_{RS}$ are expressed as

-/Jfe = - |(1 - - flfl + / - c^lw + ^^-|-^(2Afl)* - ^1. (E.4) 

Shs = - |Ag - flfl + / - (E.5) 